ED 303 944 



DOCUMENT RESUME 



EC 212 141 



AUTHOR Benderson, Albert, Ed. 

TITLE Testing, Equality, and Handicapped People. 

INSTITUTION Educational Testing Service, Princeton, N.J. 
PUB DATE 88 
NOTE 23p. 

AVAILABLE FROM FOCUS, Educational Testing Service, Princeton, NJ 
08541-0001. 

PUB TYPE Reports - Descriptive (141); Collected Works - 
Serials (022) 
JOURNAL CIT Focus; v21 1988 

EDRS PRICE MF01/PC01 Plus Postage. 
DESCRIPTORS Admission Criteria; College Admission; *College 
Entrance Examinations; College Students; Comparative 
Analysis; *Comparative Testing; Difficulty Level; 
*Disabilities; Factor Structure; Higher Education; 
*Predictive Validity; Scaling; Selective Admission; 
*Testing Problems; Test Items; Test Reliability; 
*Test Validity 

ABSTRACT 

The scores of handicapped students taking tests such 
as the Scholastic Aptitude Test (SAT) or the Graduate Record 
Examinations are flagged so that admissions officers will be aware 
that they were achieved under special circumstances. A series of 
studies was initiated to determine whether special administrations of 
such tests are comparable to standard administrations, in which case 
flagging would no longer be necessary. The studies looked at 
comparability data for test takers with hearing impairments, visual 
impairments, physical handicaps, and learning disabilities. 
Comparability between standard and nonstandard test forms was found 
to be high, particularly with respect to such characteristics as 
reliability, factor structure, and differential item difficulty. 
Analysis of the tests' predictive validity with regard to academic 
performance found that there was little over- or under-prediction for 
the great majority of handicapped students. The SAT did, however, 
substantially overpredict college performance for learning-disabled 
students, and this overprediction was exacerbated by time extensions 
during test administrations. The need for flagging test scores may be 
eliminated by establishing comparable timing criteria for special 
test administrations or by rescaling nonstandard test administrations 
according to how handicapped students performed in school. The 
comparability study also examined admissions decisions, test content, 
and testing accommodations. (JDD) 



********************************************************************** 



Reproductions supplied by EDRS are the best that can be made 
from the original document. 







Testing, Equality, and Handicapped People 



Standardized college admission tests 
were designed to provide a common 
yardstick for measuring the academic 
reasoning abilities of all students. 

The Scholastic Aptitude Test, for 
instance, broadened college admissions 
by enabling students from any school 
anywhere in the country to demonstrate 
that they have academic potential 
equalling that of affluent students 
from the most elite prep schools. 



Since ETS is committed to 
making its tests available to all 
students, it has traditionally made 
special provisions for those with 
handicaps. Braille and audio 
cassette versions are available for 
blind students. Special facilities are 
provided for those with physical 
disabilities. Extra time is provided 
for students with impaired hearing 
and with learning disabilities. 

Unfortunately, ETS has been 
unable to certify that test scores 
earned under such special conditions 
are completely comparable to 
those earned at regular administrations. 
It has, therefore, traditionally 
flagged the scores of handicapped 
test takers so that admissions 
officers will be aware that they 
were achieved under special 
circumstances. 

This practice, however, has long 
been the subject of considerable 
controversy. Advocacy groups for 
handicapped people have objected 
to flagging as a practice that identi- 
fies disabled individuals, making it 
easy to exclude them. The concern 
is that some colleges would prefer to 
exclude such students, thereby 
avoiding the expense of making 
special provisions for them. 

Admissions officers, on the other 
hand, have argued that flagging is 
necessary if the test scores are to be 
evaluated accurately. They point 
out that disabilities can affect 
college performance and, therefore, 
must be weighed in admissions 
decisions. 

ETS's continued use of flagging, 
however, has been based on its 
inability to guarantee the comparability 
of test scores. 



Section 504 

The passage of the federal Rehabili- 
tation Act of 1973 intensified the 
controversy. Section 504 under Title 
V extended civil rights protection to 
disabled people, establishing that 
they are to enjoy the same protection 
from discrimination afforded to 
all other citizens. The wording of 
the 1977 regulations implementing 
the law mandated special test 
administrations for handicapped 
people while seemingly striking 
down the practice of flagging their 
scores. 

One regulation, for instance, 
stipulates that an institution 
receiving federal funds must ensure 
that tests administered to handi- 
capped people reflect their aptitude 
or achievement levels rather than 
their impairments. 

Another regulation says that 
such institutions "may not make 
preadmission inquiry as to whether 
an applicant is a handicapped 
person." 

Flagging has been viewed by 
some as a violation of this second 
regulation because a flagged score 
report reveals that the test taker 
has a disability. Nevertheless, 
testing organizations have been 
reluctant to distribute scores 
achieved under special circum- 
stances without indicating that they 
might not be equivalent to the same 
scores achieved under standard 
testing conditions. They also feel 
constrained by professional stan- 
dards established by the American 
Psychological Association, the 
American Educational Research 
Association, and the National 
Council on Measurement in Educa- 
tion, recommending that users be 
cautious about nonstandard scores 
achieved when comparability is 
uncertain. 



"Comparable scores do 
not necessarily imply the 
same average score for 
handicapped and non- 
handicapped groups..." 






To resolve this dilemma, the 
National Academy of Sciences was 
asked to impanel a committee to 
reconcile the testing requirements 
of the new law with sound 
psychometric practice. In its 1982 report, 
the panel agreed that "current 
psychometric theory and practice do 
not allow full compliance with the 
regulations as currently drafted." 

The panel recommended, therefore, 
that a four-year study be 
conducted to determine whether 
tests modified for handicapped test 
takers are comparable to standard 
versions and whether they provide 
accurate estimates of the academic 
ability of students with disabilities. 
If the predictive validity (or 
accuracy) of both versions were found to 
be comparable, the panel suggested, 
it would no longer be necessary to 
flag the scores of handicapped test 
takers. 


In response, Educational Testing 
Service, together with the College 
Board and the Graduate Record 
Examinations Board, initiated in 
1983 a series of pioneer studies to 
determine whether special admini- 
strations of the Scholastic Aptitude 
Test and the Graduate Record 
Examinations for handicapped stu- 
dents are comparable to standard 
administrations. 

The project culminated in March 
1988 with the publication by Allyn 
and Bacon, Inc., of Testing 
Handicapped People by Warren W. 
Willingham, Marjorie Ragosta, 
Randy Elliot Bennett, Henry 
Braun, Donald A. Rock, and Donald 
E. Powers. The book provides, for 
the first time, answers to some of 
the most vexing questions 
surrounding the comparability of 
scores and offers a series of 
recommendations for the future. 



The studies looked at comparability 
data for all four categories of 
disability: 

• hearing impairment, which 
ranges from hard of hearing to 
deafness; 

• visual impairment, which may 
range from a serious visual 
deficit to blindness; 

• physical handicap, which includes 
a variety of neurological 
and orthopedic disabilities; 

• learning disability, defined as a 
specific perceptual, neurological, 
or cognitive deficit identified 
mainly on the basis of school 
achievement. 

The focus was on three questions 
for each of these groups: 

1. When admission tests are 
modified for handicapped 
people, to what extent are the 
nonstandard tests and the 
resulting scores comparable to 
those of the regular national 
program? 

2. Might the comparability of 
such tests be improved? If so, 
how? 

3. What implications might be 
drawn for possible resolution 
of the flagging problem? 

Researchers were concerned with 
both score comparability and task 
comparability. If the scores of 
handicapped test takers are comparable, 
they will reflect only aptitude 
or ability, rather than extraneous 
limitations or impairments. The 
test must measure the same factors 
as the standard examination, and it 
must predict college performance as 
accurately as the standard test. 

The fact that all scores studied 
were flagged complicated the 
research task because it was 
entirely possible that the flags 
themselves affected admissions 
decisions. If handicapped students 
were admitted on a fundamentally 
different basis than nonhandicap- 
ped students because of the flags, as 
critics allege, the comparability of 
scores, particularly in terms of 
predictive validity, would be more 
difficult to determine. 






With respect to task comparability, 
the cognitive demands of the 
test must be shown to be equivalent 
for handicapped and nonhandicapped 
test takers. The content must 
be comparable, no matter how it is 
presented; the accommodations 
must be appropriate; and the timing 
must be equivalent, even if 
handicapped students are allowed 
additional time to complete test 
questions. 

Willingham writes, "The matter 
often comes down, in the last 
analysis, to a judgment about what 
is reasonable and fair in testing 
people with a particular disabling 
condition." 

Researchers established an exhaustive 
series of criteria to be used in 
determining whether special 
administrations of the Scholastic 
Aptitude Test (SAT) and Graduate 
Record Examinations (GRE) were 
comparable to standard administrations. 
According to Willingham, 
these included the performance of 
handicapped students on different 
types of test materials or formats, 
the performance in college of 
nonstandard test takers, evidence of 
the speed with which handicapped 
students complete tests, and the 
comparability of handicapped 
student accommodations in admis- 
sions testing to those used in college 
testing. 

Researchers tracked the performance 
of students with each of the 
four handicapping conditions on 
most versions of the tests to determine 
how differing formats affected 
performance. The frequency with 
which different groups completed 
tests with different time limits was 
tabulated. Researchers were also 
concerned with the reliability or 
precision of test scores and with 
whether specific test items measured 
the same factors for handicapped 
and nonhandicapped students. 

Test results were tallied for most 
configurations. The scores for 
visually impaired students, for 
instance, were tracked for regular-type, 
large-type, and braille editions 
of the SAT. Results were also 
compared to those achieved by 
regular students using standard 
test forms. 

Sophisticated statistical measures 
were applied to test questions 
to determine whether they were 
measuring the same factors for all 
test populations. 

Finally, SAT and GRE scores 
were correlated with first-year 
grades to determine whether special 
versions of tests administered to 
handicapped students predicted 
their performance in college with 
accuracy comparable to that yielded 
by standard versions. 






Willingham cautions that differing 
versions of a test do not have to be 
identical for them to be comparable. 
"Comparable scores," he writes, "do 
not necessarily imply the same 
average score for handicapped and 
nonhandicapped groups because 
there is no way to know whether 
the groups are either representative 
of students generally or comparable 
in their learning experience." 

"The important objective," he 
adds, "is to make the task as 
comparable as possible by removing 
irrelevant sources of difficulty." 

Three processes, he says, are 
involved in answering test questions: 
sensory-motor, encoding, 
and higher-level cognitive processes. 
College admission tests are 
designed to measure cognitive 
abilities, and interference arising 
from defects in other processes 
must be screened out. The defective 
sensory-motor processes of those 
with physical handicaps and the 
limited encoding processes of blind 
and learning-disabled students 
must in no way affect test outcomes 
if results are to be considered 
comparable. 

The researchers measured 
comparability in eight dimensions: 
reliability, factor structure, 
differential item difficulty, prediction of 
academic performance, admissions 
decisions, test content, testing 
accommodations, and test timing. 

Overall, comparability between 
standard and nonstandard test 
forms was found to be high, particu- 
larly with respect to such internal 
characteristics as reliability, factor 
structure, and differential item 
functioning. Test results were not 
affected by the extraneous physical 
limitations of handicapped test 
takers. 

Across the board, for instance, 
the tests were found to be highly 
comparable with respect to reliabil- 
ity — their measurements are 
equally precise for handicapped and 
nonhandicapped test takers. 

Willingham points out, however, 
that it must also be demonstrated 
that the tests measure the same 
thing. Factor analysis is the 
statistical method used to make this 
determination. 

Except for the fact that verbal 
and quantitative abilities were 
found to be less closely related for 
handicapped test takers, the factor 
analysis revealed test forms to be 
highly comparable. Willingham 
writes, "The similarity in the tests' 
factor structure for handicapped 
and nonhandicapped examinees 
supports the assumption that the 
nonstandard test scores represent 
comparable cognitive abilities and 
that they have not been distorted by 
the student's disability." 



Although nonstandard and 
standard tests were shown to 
measure comparable cognitive 
abilities, it remained possible that 
nonstandard versions might contain 
questions that were inappropriate 
because they were particularly 
difficult only for disabled test 
takers. A differential item functioning 
analysis was conducted to 
determine whether such bias 
existed, and except for a few 
questions on the braille version of the 
mathematical portion of the SAT, 
little evidence of such questions 
surfaced. 
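
The article does not say which statistic the differential item functioning analysis used. As an illustration only, the sketch below assumes one widely used approach, the Mantel-Haenszel common odds ratio: test takers are first grouped by overall score, and within each score level the odds of answering one item correctly are compared between a reference group and a focal group. The function name and all counts are hypothetical and are not taken from the study.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one test item across ability levels.

    strata: list of tuples, one per score level:
        (ref_right, ref_wrong, focal_right, focal_wrong)
    A value near 1.0 suggests the item behaves alike for both groups;
    a value well above 1.0 suggests the item is harder for the focal
    group than for equally able reference-group test takers.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        raise ValueError("odds ratio undefined for these counts")
    return numerator / denominator


# Hypothetical counts for two score levels (illustrative only).
flagged_item = [(40, 10, 30, 20), (30, 20, 20, 30)]
neutral_item = [(40, 10, 40, 10), (30, 20, 30, 20)]

flagged_ratio = mantel_haenszel_odds_ratio(flagged_item)
neutral_ratio = mantel_haenszel_odds_ratio(neutral_item)
print(flagged_ratio > 1.0, neutral_ratio)
```

Matching on overall score first is the point of the design: an item is suspect only if it is harder for one group among test takers of similar measured ability, not merely because one group scores lower on average.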

With respect to test content, it 
would seem self-evident that tests 
delivered in standard and nonstandard 
administrations must be 
comparable since the content is 
identical. The issue, however, is not 
necessarily so easy to resolve, 
because identical questions might 
not be comparable if they are made 
more difficult by the disabilities of 
some test takers. 

Although such problems were 
found to be rare, the report 
questions the comparability of SAT and 
GRE verbal questions for 
hearing-impaired students. Those who have 
been deaf from birth have particular 
difficulties communicating in or 
understanding written English, 
which is a fundamentally different 
language from the sign language 
they normally use to communicate. 
For instance, the various forms of 
sign language typically lack articles 
and prepositions, and their 
grammatical structures differ radically 
from English. Students who have 
never heard English have an 
extremely difficult time 
comprehending its structure or meaning. 






These difficulties were reflected 
in the average SAT verbal and 
mathematical ability scores for 
hearing-impaired students, which 
were considerably lower than those 
achieved by other handicapped 
groups. The report suggests that for 
some deaf students, these low 
scores may reflect the 
noncomparability of test questions rendered 
unnecessarily difficult by the 
students' lack of English 
communication skills. Willingham points out 
that manually fluent students 
tended to receive the lowest scores. 
He suggests, therefore, that a 
sign-language version of the test might 
provide a more valid assessment of 
their skills and recommends that 
the feasibility of such a test should 
be examined. 

An investigation of admissions 
decisions revealed that, contrary to 
the assumptions of those in some 
handicapped advocacy groups, the 
selection process for handicapped 
applicants was generally comparable 
to that for nonhandicapped 
students. Despite flagged scores, 
admissions of handicapped and 
nonhandicapped students alike 
increased in direct proportion to 
increases in their high school 
grades and test scores. The effect of 
flagging, therefore, seemed 
minimal. 

A high percentage of disabled 
students expressed satisfaction 
with testing accommodations, with 
94 percent of SAT test takers and 
86 percent of GRE test takers 
approving of testing conditions. 

Establishing the predictive validity 
of the tests with regard to the 
academic performance of disabled 
students was absolutely essential. 
Willingham explains, "The NAS 
panel viewed the accuracy of grade 
predictions as a crucial aspect of 
comparability, and with good 
reason. The validity issue is 
whether one can safely make the 
same inference as to future 
academic performance when looking at 
test scores from nonstandard and 
regular administrations. Do the 
nonstandard scores predict 
performance accurately? Are they useful to 
the college and fair to the 
students?" 

Test scores for handicapped 
students taking the SAT and GRE 
in a variety of configurations were 
compared to first-year grades in 
college and graduate school. As with 
nonhandicapped students, the 
accuracy of predictions was 
enhanced by combining test scores 
with high school grades. The report 
concludes that "when academic 
performance was predicted on the 
basis of test scores and prior grades, 
there was little over- or 
under-prediction for the great majority of 
handicapped students." 
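
Over- and under-prediction can be made concrete with a small sketch. It assumes the usual framing: a prediction line is fitted on a reference group, then applied to another group, and the average gap between actual and predicted grades is examined. The single-predictor model, the helper names, and every number below are hypothetical illustrations, not data or methods taken from the study.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_prediction_error(intercept, slope, scores, grades):
    """Average of (actual grade - predicted grade).

    Negative: grades fall short of predictions (overprediction).
    Positive: grades exceed predictions (underprediction).
    """
    return mean(g - (intercept + slope * s) for s, g in zip(scores, grades))


# Hypothetical reference group: test scores and first-year grades.
ref_scores = [400, 450, 500, 550, 600, 650]
ref_grades = [2.0, 2.25, 2.5, 2.75, 3.0, 3.25]
intercept, slope = fit_line(ref_scores, ref_grades)

# Hypothetical comparison group scored with the same prediction line.
grp_scores = [450, 500, 550, 600]
grp_grades = [2.0, 2.3, 2.5, 2.8]
err = mean_prediction_error(intercept, slope, grp_scores, grp_grades)

if err < 0:
    label = "overpredicted"
elif err > 0:
    label = "underpredicted"
else:
    label = "on target"
print(label)
```

In this invented example the comparison group earns lower grades than the reference line predicts, so its scores overpredict performance. Adding prior grades as a second predictor, as the report describes, would require a multiple-regression fit rather than the single line used here.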







Willingham writes, "This is an 
important finding because it 
indicates that if admissions officers 
follow the standard advice and 
usual practice of using grades as 
well as test scores in estimating 
future performance, these estimates 
will not, on average, be either too 
high or too low." 

When looking at subgroups of 
test takers, rather than aggregate 
groups, however, researchers found 
that the academic performance of 
handicapped students was less 
predictable than that of their 
nonhandicapped classmates. This 
finding held up whether test scores, 
grades, or both were used as 
predictive instruments, and it was 
applicable at both undergraduate and 
graduate levels. 

"If you break down the group into 
high and low scorers on any 
predictive measure," says Willingham, 
"the handicapped students were 
less predictable. Those who score 
quite high on the test do worse in 
school than you would expect, and 
those who score low do better." 

Willingham attributes some of 
this lower predictability to 
variations in the quality of educational 
programs for the disabled and to 
outside factors, such as financial 
problems or lack of support 
programs, that have a particular 
impact on those with handicaps. 

Results also varied for different 
handicaps. For instance, test scores 
substantially underpredicted college 
grades for hearing-impaired stu- 
dents enrolled in college programs 
that provided special services for 
them. That is, grades in these 
programs were higher than test 
scores predicted. (When the college 
performance of hearing-impaired 
students in regular college pro- 
grams was predicted on the basis of 
tests and grades, accuracy was 
high.) 

On the other hand, the SAT 
overpredicted college performance 
for both the physically handicapped 
and the learning disabled. For the 
physically handicapped, the 
overprediction was not very large, but 
for the learning disabled, the degree 
of overprediction was substantial. 
These students were not 
significantly overpredicted when 
grades were added to test scores, 
because their high school grades were 
significantly lower, but the result was 
still troubling. 

Willingham points out an inherent 
problem in establishing the 
predictive validity of scores for 
learning-disabled students. People 
are identified as learning disabled 
precisely because their achievement 
does not measure up to their test 
scores. One of the primary criteria 
for distinguishing learning-disabled 
students from slow learners is precisely the 
fact that while they do well on tests 
of ability and have average to 
above-average I.Q.s, their academic 
performance does not measure up to 
these test results. 

In contrast to disabilities arising 
from physical deficiencies and 
readily apparent to observers, 
learning disabilities are primarily 
academic disabilities. They are 
defined by poor academic perform- 
ance and cover a wide range of 
conditions including dyslexia, 
perceptual handicaps, and minimal 
brain dysfunction. Such conditions 
are not readily apparent to observ- 
ers, and diagnosis can be highly 
subjective. 

The Education for All Handicapped 
Children Act of 1975 defined 
learning disability as "a disorder in 
one or more of the basic psychological 
processes involved in understanding 
or using language, spoken 
or written, which may manifest 
itself in an imperfect ability to 
listen, think, speak, read, write, 
spell, or do mathematical 
calculations." The definition specifically 
excludes visual, hearing, or motor 
handicaps, mental retardation, and 
the effects of environmental, 
cultural, or economic disadvantage. 

One of the most common learning 
disabilities is dyslexia, a condition 
characterized by impaired ability to 
read. Dyslexics may transpose 
letters in words, mistake one word 
for another, skip words or lines 
entirely, or have difficulty sounding 
out words. 







Learning disabilities may also 
include other language processing 
problems such as short-term memory 
deficits that render readers 
incapable of remembering what 
they have just read and organizational 
deficiencies that make it 
impossible for readers to distinguish 
main ideas from supporting 
evidence. Although many of these 
problems seem to be physiologically 
based, the exact mechanisms at 
work are unclear pending further 
research. 

Willingham points out that 
federal regulations specify that 
learning-disabled students be 
identified on the basis of poor school 
performance in relation to ability. 
"It would seem to make little 
sense," he writes, "to evaluate score 
comparability for this group on the 
basis of over- or underprediction 
(from test scores alone) when a 
discrepancy in the test-school 
achievement relationship is 
precisely the basis upon which the 
group is identified!" 

Two factors seem to underlie the 
overpredictive scores of 
learning-disabled students. The first is the 
imprecise definition of learning 
disabilities, and the second is the 
time allowed to complete the test. 

The numbers of people in this 
category have grown tremendously 
in recent years, and today more 
than 1.8 million pupils are identified 
as learning disabled. There is 
strong evidence that some of this 
growth reflects the tendency of 
some schools to place many students 
without physiologically based 
learning problems into this category 
inappropriately. 






"There's a lot of social and 
educational cleavage on how to view 
all this," says Willingham. "When 
we see enormous increases in the 
numbers of people identified as 
learning disabled, many suspect 
some kind of educational 
game-playing. Some suspect that schools 
might funnel more people into this 
category to attract more program 
funds, rather than maintaining an 
accurate scientific or educational 
categorization. Whereas a lot of 
people used to be labeled mentally 
retarded, now fewer are placed in 
that category and more are labeled 
learning disabled. Since learning 
disabilities are not clearly labeled 
sensory deficits, this makes people 
skeptical and arouses controversy." 

In 1982, Lorrie A. Shepard and 
Mary Lee Smith, both from the 
University of Colorado Department 
of Education, conducted a study of 
learning-disability placements in 
the State of Colorado. The results 
were summarized in their Spring 
1983 Learning Disability Quarterly 
article "An Evaluation of the 
Identification of Learning Disabled 
Students in Colorado." They wrote, 
"Approximately 60 percent of the 
pupils currently identified as LD do 
not match the legal definitions or 
the definitions presented in the 
professional literature." 

Shepard and Smith found that 
many students had been 
inappropriately placed in the 
learning-disabled category. They found that 
26.8 percent had been placed in 
learning-disabled classes without 
any I.Q. test data, 28.5 percent had 
I.Q.s below 90, and 8.3 percent had 
I.Q.s below 80. 



Among the student population 
identified by the schools as learning 
disabled, only 43 percent 
demonstrated actual learning disabilities. 
The remainder included slow 
learners, emotionally disturbed 
students, and non-native English 
speakers. 

If Shepard's findings hold true 
for the nation as a whole, they 
imply that scores on the special 
administrations of the SAT for the 
learning disabled may be artificially 
inflated when additional time is 
provided to students who are 
inappropriately placed in that 
category. 

In fact, the researchers found that 
the tendency of standardized tests 
to overpredict academic performance 
for the learning disabled was 
exacerbated by time extensions. 
Willingham reports that providing 
extended time to learning-disabled 
test takers "may raise scores 
beyond the level appropriate to 
compensate for the disability." 

Learning-disabled students are 
currently allowed up to 12 hours to 
complete the SAT, a virtually 
unlimited block of time. Learning- 
disabled students who took the 
most time to complete the test 
earned the scores that most seri- 
ously overestimated college per- 
formance. These results have raised 
serious questions about whether or 
not learning-disabled students 
should be granted additional time 
to complete exams and, if so, how 
much extra time should be granted. 

Generally, all handicapped 
students, regardless of disability, 
achieved higher scores when they 
took additional time on the exam. 
Increases ranged from 30 points for 
the hearing impaired to 38 for the 
learning disabled. Taking additional 
time also increased the chances of 
reaching late items and answering 
them correctly, even though the 
final items tend to be the most 
difficult. 




However, the first-year college 
performance of learning-disabled 
students, unlike other groups, was 
significantly overpredicted when 
they took additional time, and the 
degree of overprediction increased 
the more time they took. Willingham 
writes, "This appears to be 
direct evidence that the SAT scores 
of LD students who took longer 
amounts of time on the test were 
somewhat inflated." 

There were also lesser indications 
that timing may inflate scores 
for physically handicapped and 
hearing-impaired test takers, but in 
both cases the effects were minimal 
or vitiated by other factors. 
Nonetheless, these findings led 
Willingham to conclude that timing 
represented the only aspect of 
nonstandard test administrations that was 
not comparable to standard 
administrations. He says that scores are 
raised at least somewhat beyond 
the level that would be achieved 
with comparable time, although the 
problem is acute only with respect 
to test takers who are learning 
disabled. 

These results have led some to 
question the philosophical basis for 
allowing learning-disabled students 
to have large amounts of additional 
time to complete a test such as the 
SAT or the GRE. 



Marjorie Ragosta says that be- 
cause learning-disabled students 
achieve relatively low scores on 
college admission tests, research 
must be done to determine whether 
these scores are due to inaccurate 
measurement or are an accurate re- 
flection of the students' achieve- 
ment levels. "If the differential 
performance is not caused by meas- 
urement inaccuracies," she says, 
"does it make sense to turn around 
and say that you have to adjust the 
methods of testing or give a differ- 
ent test because it reflects differen- 
tial performance? Is it realistic to 
try to test people with a learning 
disability as if they don't have it?" 

Ragosta speculates that 
overprediction arises on the SAT 
because some learning-disabled 
students, particularly those who 
take seven or eight hours extra, 
are receiving relatively more time 
for the test than they can continually 
give to their college assignments. 
"What we are doing, in 
effect, for one little part of this 
individual's life, is to allow unlimited 
time that is not feasible 
everywhere," she says. 

Willingham says, "To label 
students learning disabled and say 
they should have more time on the 
test because they don't do well at 
tests seems to stand the argument 
on its head." 

Randy Bennett suggests, how- 
ever, that overprediction may also 
be caused by the fact that it takes 
learning-disabled students longer to 
become oriented to college. High 
school special education programs 
are highly structured, while college 
is not. Learning-disabled students, 
he argues, might have more diffi- 
culty adjusting in the first year and 
then do better in subsequent years. 

He also suggests that the over- 
prediction reported in the study 
may not hold for those attending 
colleges with special programs for 



learning-disabled students. As 
indicated earlier, test scores, in fact, 
underpredict the college grades of 
learning-disabled students in 
special programs. Such programs 
provide help with study skills and 
often permit students to take a 
lighter course load each semester. 
This, in effect, allows them to 
devote additional time to each 
subject. He points out that in the 
last few years the number of these 
programs has increased 
significantly, so that even the most 
prestigious schools, such as 
Dartmouth and Brown, now have special 
programs for learning-disabled 
students. 

"If we don't give extra time to 
students who will be in special 
programs in college," says Bennett, 
"the test will be just as invalid as if 
we provide extra time to students 
who receive no extra help in 
college." 

Sally Shaywitz of the Yale 
Medical School says that the need 
for extra time is fundamental to the 
definition of learning disabilities. 
"Learning-disabled students don't 
need extra programs, but they need 
more time to process information 
and get it on paper," she says. "The 
whole discrepancy is between their 
intelligence and what they can do in 
a given amount of time. For 
learning-disabled students, extra time is 
absolutely crucial. To deny it is the 
kiss of death. 

"If you don't know what kind of 
accommodation the kids had at 
college, you can't determine the 
predictive validity of the test," she 
adds. 

Willingham, however, disputes the 
notion that all learning-disabled 
students need extended time on 
admission tests. He suggests that 
ETS data indicate that only seri- 
ously disabled students need extra 
time to obtain a score that has 
predictive validity. For most learn- 
ing-disabled students, he says, 
extra time merely leads to an 
inflated prediction of college per- 
formance. 

"These results," he writes, 
"suggest that testing programs need 
to reevaluate their policies regard- 
ing extended time for LD students, 
especially as to how much time 
should be allowed and whether it is 
possible to improve present prac- 
tices concerning eligibility for the 
nonstandard examination." 



"For learning-disabled 
students, extra time is 
absolutely critical. To 
deny it is the kiss of 
death." 







Ragosta suggests that standards
be tightened so that only students
who can demonstrate a history of
accommodations for a learning
disability throughout their educa-
tion be permitted to have extra time
on the SAT or GRE. Currently,
students can qualify for a special
administration by presenting two
pieces of documentation from
experts in learning disabilities or
evidence of an Individualized
Education Program. All special
education students in elementary
and secondary schools are supposed
to have such a program setting
individual educational goals. It is
developed by a committee of school
officials (including a teacher) in
cooperation with the child's parents
and should reflect a realistic assess-
ment of what the child can learn
and what kind of special help will
be needed.

Ragosta expresses skepticism
about some expert documentation
and suggests that Individualized
Education Programs should be the
primary qualification for extra help
for people in public school systems.
Assistance on the test would reflect
the assistance received at school.
For instance, visually handicapped
students using large-print texts in
school would receive large-print
tests. Similarly, learning-disabled
students receiving extra time to do
school assignments would be
allowed extra time to complete the
SAT or GRE.

"I don't think it's fair for the 
testing company to assume the 
entire burden of deciding how 
students will be tested," she says.
"We have only one encounter with
the individual and don't really know
what his or her disability is. If the
Individual Education Program says
that a student [...] the test
only with [...] time, I think
that's a good [...] what the
proper accommodations should be."

In lieu of more stringent qualify-
ing criteria for special administra-
tions, the report recommends that
time limits be established for all
handicapped groups comparable to
the time limits imposed on nonhan-
dicapped test takers. Since the SAT
and GRE have traditionally set
time limits deemed adequate for 80
percent of test takers to answer all
questions, it has been recommended
that the same standard be estab-
lished for handicapped test takers.

In order to carry out this recom-
mendation, Ragosta has embarked
upon a study to determine the time
it takes 80 percent of those in each
handicapped category to complete
the SAT and the GRE. Presumably,
this will make the timing on special
administrations more comparable to
that on regular administrations and
will help to alleviate the overpredic-
tion problem.

In order to determine these time
limits, Ragosta will review timing
records compiled by the various
handicapped groups during past
years.
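
The 80 percent standard can be pictured as a simple calculation: given the
recorded completion times for a group, find the shortest time within which at
least 80 percent of its members finished. The sketch below is illustrative
only. The function name, the nearest-rank method, and the timing figures are
assumptions made for the example; they are not ETS procedure or ETS data.

```python
def time_limit_for_completion(times_minutes, percent=80):
    """Shortest recorded time within which at least `percent` of takers finished.

    Uses the nearest-rank percentile (an assumed method, chosen for simplicity).
    """
    if not times_minutes:
        raise ValueError("no timing records")
    ordered = sorted(times_minutes)
    # Rank k = ceil(percent * n / 100), computed with integers to avoid
    # floating-point rounding surprises.
    k = (percent * len(ordered) + 99) // 100
    return ordered[k - 1]


# Hypothetical completion times in minutes, invented for illustration.
records = {
    "group_a": [150, 170, 180, 190, 200, 210, 220, 240, 260, 300],
    "group_b": [160, 175, 185, 195, 230],
}

limits = {group: time_limit_for_completion(times)
          for group, times in records.items()}
print(limits)
```

With these invented figures, eight of the ten takers in the first group finish
within 240 minutes, so 240 would be that group's limit. Those who need longer
are the cases an exception for severe disability would have to cover.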

"When we provide 12 hours, 
which is a virtually unlimited 
amount of time, we are saying that 
every handicapped student should
have a chance to finish the exam,"
says Ragosta. "There should be at
least some equality to be fair to the
population at large. If we can
determine the time that allows 80
percent of those with a disability to
finish, that should provide a cut
time comparable to that given the
general population.

"Of course, we want to leave a 
loophole for those for whom the 
severity of disability precludes 
meeting this standard." 

In order to strengthen knowledge 
of predictive validity, Ragosta and 
ETS research scientist Henry 
Braun have also initiated a study to 
compare handicapped students' test 
scores to four-year, rather than 
first-year, college performance. It is 
hoped that the study will yield a 
more solid estimate of the predic- 
tive validity of special administra- 
tions. The researchers will also 
attempt to discern whether handi- 
capped students generally take 
longer to complete college than do 
nonhandicapped students, an issue 
they consider relevant to the 
granting of additional time on 
standardized tests. 







Bennett expresses the hope that 
the study of four-year progress in
college will also reflect the impact of 
special programs for the learning 
disabled recently instituted at many 
colleges. 

In addition, Bennett is
conducting a study of item bias for
students with visual handicaps on
the SAT mathematics section. He is
attempting to discover which types 
of items don't work for blind stu- 
dents. Preliminary results indicate 
that the abilities of blind students 
cannot be tested accurately by 
items containing drawings and 
small diagrams and by those that 
ask test takers to estimate solutions 
based on visual material. 






It's possible that establishing com-
parable timing criteria for special
test administrations will help solve
the ongoing dispute over the prac-
tice of flagging test scores. Handi-
capped advocacy groups have long
pressed for an end to flagging
because they fear that the practice 
provides an easy method for spot- 
ting and rejecting the applications 
of handicapped students. Until now, 
ETS has flagged handicapped stu- 
dents' test scores because it could 
not guarantee their comparability 
and therefore was bound by estab- 
lished professional standards. 

Now that the issue of comparabil- 
ity has been thoroughly examined, 
and only relatively limited areas of 
noncomparability have been found 
to exist, an end to the practice of 
flagging scores seems within sight. 
Establishing comparable timing
limits for special test administra-
tions would go a long way towards
solving the problem. Another
possibility, also investigated, would
be to rescale the nonstandard
administrations according to how
handicapped students performed in
college and graduate school. Rescal-
ing might also eliminate the need
for flagging, perhaps without
limiting test-taking time.

Rescaling was first suggested by
the National Academy of Sciences
panel. It proposed that scores of
handicapped test takers could be
made to predict college performance
with the same degree of accuracy as
those of nonhandicapped students
by adjusting the scores according to
some sort of statistical formula. For
instance, scores might be adjusted
according to how handicapped
students performed in school so
that an 800, for instance, would
represent the highest level of work
students with a particular handicap
accomplish in higher education.

Donald E. Powers and Willing-
ham conducted an extensive study
of the rescaling proposal. Powers
says, "The proposal was that you
could make the scores of handi-
capped and nonhandicapped test
takers comparable by looking at
how both types of students perform
during the first year of school,
taking the first-year grade-point
average as a common link, and then
adjusting test scores to obtain a
comparable prediction of first-year
performance. It would entail adding
a constant to modify the scores of
handicapped students. It seemed
like a proposal worthy of consi-
deration."
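
The additive adjustment Powers describes can be sketched in a few lines. This
is a minimal illustration, assuming first-year grades are predicted from test
scores by a straight-line equation. The function names, the coefficients, and
the student figures are hypothetical; none of them comes from the ETS study.

```python
def mean_prediction_error(scores, gpas, intercept, slope):
    """Average of (actual GPA - predicted GPA); negative means overprediction."""
    errors = [gpa - (intercept + slope * score)
              for score, gpa in zip(scores, gpas)]
    return sum(errors) / len(errors)


def additive_constant(scores, gpas, intercept, slope):
    """Score adjustment that brings the group's mean prediction error to zero.

    Adding c to every score shifts every prediction by slope * c, so the
    mean error e becomes e - slope * c, which is zero when c = e / slope.
    """
    return mean_prediction_error(scores, gpas, intercept, slope) / slope


# Hypothetical prediction line fitted on the reference group (assumed values).
INTERCEPT, SLOPE = 0.5, 0.004

# Hypothetical scores and first-year GPAs for one group, invented for the example.
group_scores = [400, 500, 600]
group_gpas = [1.9, 2.3, 2.7]

constant = additive_constant(group_scores, group_gpas, INTERCEPT, SLOPE)
adjusted = [score + constant for score in group_scores]
remaining = mean_prediction_error(adjusted, group_gpas, INTERCEPT, SLOPE)
print(constant, remaining)
```

In this invented case every grade falls 0.2 below its prediction, so the
constant comes out at roughly -50 score points. A negative constant lowers the
scores of a group whose grades are overpredicted.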






Ultimately, however, the re-
searchers rejected that proposal.
They found that it was not possible,
given the limited number of people
with various degrees of handicaps,
to collect a large enough pool of
data upon which to base scaling de-
cisions. The problem was particu-
larly acute for the GRE General
Test, which is taken by far fewer
students than the SAT.

Moreover, they concluded that
grade-point averages would not
provide a sufficiently reliable and
comparable criterion for rescaling a
test. The standards for grades vary
widely at different colleges and this
variation may be exaggerated by
differences in the evaluation of
handicapped and nonhandicapped
students. There would be no assur-
ance that grade-point averages as a
criterion would be comparable for
handicapped and nonhandicapped
students, and finding national
points of reference would be virtu-
ally impossible.

Willingham and Powers also
found that adding a constant to the
scores of handicapped students
would not result in adequate scaling
due to complicated variations in the
predictive validity of scores for
handicapped students at different
ability levels. Several adjusted
scores might be necessary for
handicapped students, and the
researchers point out that the
process would be so apparent that it
would be tantamount to flagging.

Other, nontechnical problems
also surfaced when the rescaling
proposal was examined. It might be
argued that if scores were to be
rescaled for one subgroup, why not
for all? Racial and ethnic minorities
and women, for instance, might also
demand that their scores be re-
scaled so that differences in predic-
tive validity, if any, between these
groups and White males will be
eliminated.

Powers also points out that 
rescaling might actually harm, 
rather than help, some groups, such 
as the learning disabled. He says, 
"It looked like what we would have 
to do to adjust scores would hurt 
learning-disabled students because 
their test scores tend to overpredict 
grades. Downward adjustments 
would be hard to defend, especially 
in light of sparse data and the 
resultant shaky statistics." 

He concludes, "Although, in prin- 
ciple, rescaling seemed not to be 
unreasonable, the more we looked 
at the data, the less technically 
feasible it seemed. It had a definite 
potential for adding inaccuracies to 
the system and would potentially do 
more harm than good. We concluded 
that rescaling was not a feasible 
way to get out of the flagging di-
lemma."

The unacceptability of rescaling
as an alternative to flagging leaves
establishment of comparable time
limits for handicapped students as
the best hope for eliminating the
practice while assuring admissions
officers that the resultant scores
will be comparable.

"When timing is comparable for 
disabled test takers," says Willing-
ham, "they will be as likely, on aver-
age, to finish the test as nondis-
abled candidates. We now know
that standard and nonstandard test
administrations are comparable
except for timing. In theory this is
correctable. We should gather the 
data to make timing comparable 
and then take the flags off the 
scores." 



Testing Handicapped People con-
cludes with a number of additional
recommendations for improving
testing services for handicapped
people. Key recommendations
include:

• Routinely checking all forms of
the SAT and GRE for items that
may be differentially difficult for
handicapped test takers. These
may include items that require
visual or hearing experience to
understand.

• Ensuring that test-familiarization
and practice materials are
available in braille, large type,
and audio cassette formats.

• Identifying types of mathematics
items that cause particular
difficulty for blind students.
Experience indicates that items
involving three-dimensional
figures may be inordinately diffi-
cult. As described earlier, a study
of this problem is currently under
way.

• Examining the possibility of
translating admission tests into
American Sign Language. This
will be more fair to some hearing-
impaired students, particularly
those in colleges where classroom
lectures are signed.

• Developing guidelines for ensur- 
ing comparability in small 
testing programs. 

• Providing admissions officers
with better information than that
currently available on important
characteristics to look for in iden-
tifying disabled students with
academic promise. Since the
college performance of students
with disabilities is less predict-
able from test scores and previ-
ous grades, special materials on
disabilities and score interpreta-
tion might be provided to admis-
sions officers. They should be
cautioned to give less weight to
traditional predictors and to take
special care to review the back-
ground and personal characteris-
tics of handicapped applicants.

• Developing better means of
assessing the educational needs
of handicapped students and
monitoring their progress.
Computer-based programs being
developed by the College Board
and ETS to diagnose learning
problems in mathematics,
writing, reading, and study skills
might be adapted for handi-
capped students.



Generally, the results of the
research reported in Testing Handi-
capped People have been encourag-
ing. Except for the timing problem,
the comparability of results on
nonstandard test administrations is
strong. Efforts are currently under
way to solve the timing problem
and allow flagging to be eliminated.

The fact remains, however, that 
there may never be a foolproof way 
to completely disentangle the effects 
of some disabilities from the assess- 
ment of verbal and quantitative 
reasoning skills by admission tests. 
Some disabled students, therefore, 
will always score lower on these 
measures as a result of their handi- 
caps. The crucial question is how 
accurately these scores predict 
college performance and how 
comparable they are to scores 
attained by nonhandicapped stu- 
dents. The research described in 
Testing Handicapped People sug-
gests that the comparability of stan-
dardized test results, for most
handicapped test takers, remains 
remarkably high and will be im- 
proved in the future. 



Landmark College

ETS researchers, in
Testing Handicapped
People, assert that
special administra-
tions of the SAT for
learning-disabled stu-
dents overpredict
their performance in
college in part be-
cause they are not
allowed extra time to
complete college as-
signments compa-
rable to the extra
time they receive on
the test. Thus, their
college grades suffer
in comparison to their
test scores.

There is, however,
one college where the
entire curriculum is
designed for students
with the most com-
mon learning disabil-
ity: dyslexia. Just
as Gallaudet Univer-
sity accommodates
the special needs of
deaf students, Land-
mark College in
Putney, Vermont, has
modified its curricu-
lum so that it can be
mastered by dyslexic
students.

The school, which
opened in September
1985, offers a precol-
lege program to
prepare students for
college-level work
and a two-year
curriculum leading to
an associate's degree
in general studies. Its
modern campus was
designed by Edward
Durell Stone to
house Windham
College, which closed
in 1978.

This year, 115 stu-
dents were enrolled
in the precollege pro-
gram and 30 in the
college division. The
college program is
certified by the State
of Vermont Board of
Higher Education
and is a candidate for
accreditation by the
New England Asso-
ciation of Schools and
Colleges.



Classes in the 
liberal arts program 
meet for five hours 
per week, rather than 
the traditional three. 
The additional class 
hours allow more 
time for course 
material to be pre-
sented. The instruc-
tor can also use the
additional time to
help students develop
the study skills
needed to assimilate
the course material. 

In addition to the 
regular load of three 
or four courses per se- 
mester, all students 
take a one-hour, one- 
to-one tutorial every 
other day with a 
faculty member who 
helps them with 
classwork, language 
skills, and organiza- 
tional skills. 





Unlike many of
the special pro-
grams for dyslexics
that have sprung up
at colleges and uni-
versities throughout
the country, the
Landmark program
forces students to
master, rather than
bypass, language
skills essential to
college work. All
students are
screened with a
battery of tests
before being admit-
ted to the college,
and those with
extreme deficits in
their language skills
must first enter the
precollege program
to raise their skills
to the twelfth-grade
level. The screening
process also ensures
that they meet the
definition of learn-
ing-disabled stu-
dents, with average
to above-average
academic abilities,
performance diffi-
culties, and high
motivation to
attempt the pro-
gram.

Once in the
college program,
however, students
are not allowed to
use any of the com-
pensatory measures,
such as taped books,
note takers, oral ex-
aminations, and
scribes, permitted at
most colleges with
special programs for
dyslexic students.
Instead, they are
expected to develop
the study skills nec-
essary to engage in
college-level work.

Amy Russian, 
assistant to the 
president, says, 
"Our program is 
highly competitive 
with a standard 
curriculum that is 
not watered down. 
In order to maintain 
the caliber of the 
college and keep 
standards high, 
most students must 
first take the 
precollege program. 
Seventy percent of 
the precollege 
students have been 
accepted at other 
colleges, but have 
not yet been ac- 
cepted by our college 
program. 

"Our college stu- 
dents have learned 
reading techniques 
to minimize the 
transposition of 
letters," she says.
"However, they still
may read slowly,
transpose words,
have spelling
problems, and find 
writing difficult." 

Russian explains
that the basic work
associated with
teaching dyslexic
students, such as
overcoming letter
reversals, is covered in
the precollege program.
In the college program,
students learn more so-
phisticated skills, such
as writing summaries to
help process textbook
material, advanced
note-taking techniques
for organizing and as-
similating classroom
lecture material, creat-
ing personal study
guides, and organiza-
tional tools for exposi-
tory writing.

Both the precollege 
and college programs 
are highly individual- 
ized, with an average 
class size of only six 
students. Individual 
meetings help prevent 
students from falling 
behind. All faculty have 
extensive training in the 
teaching of dyslexic 
students, in addition to 
their subject-matter 
training.

"It is a commonly
held belief that dyslexia
is neurological," says
Russian. "We can't cure
it, but we can teach our
students to function
successfully in a rigor-
ous academic environ-
ment."






